Predicting Types of Protein-Protein Interactions Using Various Multiple-Instance Learning Algorithms

Authors

  • Hiroshi Yamakawa
  • Koji Maruhashi
  • Yoshio Nakao
Abstract

Analyzing protein-protein interactions (PPIs) is important for understanding the biological mechanisms of cellular processes. Although large volumes of PPI data have been collected, only a small number of PPIs have been elucidated at the functional level, so the functional types of PPIs need to be predicted. A PPI described in existing pathway databases, such as the KEGG pathways, often corresponds to a pair of complexes, each composed of several subunits (proteins). Functional annotations, such as those provided by the Gene Ontology, have instead been accumulated mainly for individual proteins. This makes it difficult for a standard supervised learning method to predict PPI types, because the relationship between the input variables (annotations of subunits) and the target variable (PPI type) is ambiguous. To address this, we assume that a single subunit pair can determine the interaction type between complexes. Intuitively, this assumption means that an interaction between complexes can be reduced to an interaction between one subunit pair spanning those complexes. Under this assumption, PPI type prediction can be formalized as a Multiple-Instance Learning (MIL) problem. MIL is a kind of semi-supervised learning that has been applied in various fields, including drug activity estimation. Here, a complex pair with a PPI type is a labeled bag, and each possible subunit pair across that complex pair is an instance. The goal is to predict the labels of unseen bags from the labeled bags and the feature vectors of every instance in every bag. To solve this problem, we previously proposed a method called Voting Diverse Density (VDD) [2], a weighted voting system based on Maron's Diverse Density [1]. This paper compares VDD with other MIL algorithms on a binary classification version of the problem.
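
The bag and instance formulation above can be illustrated with a minimal Python sketch. It is not the authors' VDD method, whose details the abstract does not give. It only shows (a) how a complex pair expands into a bag of cross-complex subunit pairs and (b) the commonly cited noisy-or form of Diverse Density, assuming a Gaussian-like instance model exp(-||x - t||^2). The function names, the subunit labels and the toy feature vectors are all hypothetical.

```python
from itertools import product
from math import exp, prod


def make_bag(complex_a, complex_b):
    """Return every subunit pair across two complexes; each pair is one instance."""
    return list(product(complex_a, complex_b))


def instance_prob(x, t):
    """Assumed instance model: Pr(x matches concept t) = exp(-||x - t||^2)."""
    return exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, t)))


def diverse_density(t, positive_bags, negative_bags):
    """Noisy-or Diverse Density of a candidate concept point t.

    A positive bag contributes the probability that at least one of its
    instances matches t; a negative bag contributes the probability that
    none of its instances match t.
    """
    dd = 1.0
    for bag in positive_bags:
        dd *= 1.0 - prod(1.0 - instance_prob(x, t) for x in bag)
    for bag in negative_bags:
        dd *= prod(1.0 - instance_prob(x, t) for x in bag)
    return dd


# Hypothetical complexes with two and three subunits: the bag holds 2 * 3 pairs.
bag = make_bag(["A1", "A2"], ["B1", "B2", "B3"])

# Toy feature vectors (one per instance), purely illustrative.
positive_bags = [[(0.0, 0.0), (5.0, 5.0)]]
negative_bags = [[(5.0, 5.0)]]
near_positive_only = diverse_density((0.0, 0.0), positive_bags, negative_bags)
near_negative = diverse_density((5.0, 5.0), positive_bags, negative_bags)
```

In this toy setting the candidate point that lies on a positive-bag instance but far from every negative-bag instance scores high, while a point that coincides with a negative-bag instance scores zero. This mirrors the assumption that one subunit pair can account for the interaction type of the whole complex pair.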

Similar resources

Predicting Types of Protein-Protein Interactions Using a Multiple-Instance Learning Model

We propose a method for predicting types of protein-protein interactions using a multiple-instance learning (MIL) model. Given an interaction type to be predicted, the MIL model was trained using interaction data collected from biological pathways, where positive bags were constructed from interactions between protein complexes of that type, and negative bags from those of other types. In an ex...

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of protein localization is among the most important issues in bioinformatics; it is used to predict where proteins reside in cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use the features extracted from pro...

In Silico Screening Studies on Methanesulfonamide Derivatives as Dual Hsp27 and Tubulin Inhibitors Using QSAR and Molecular Docking

The expression of heat shock protein 27 (Hsp27), a chaperone protein, is increased in response to various stress stimuli such as anticancer chemotherapy. This phenomenon can lead to cell survival and cause drug resistance. In this study, a series of methanesulfonamide derivatives as dual Hsp27 and tubulin inhibitors in the treatment of cancer were applied to quantitative structure–act...

Learning from Data with Complex Interactions and Ambiguous Labels

In this thesis, we develop and evaluate machine learning algorithms that can learn effectively from data with complex interactions and ambiguous labels. The need for such algorithms is motivated by such problems as protein-protein binding and drug activity prediction. In the first part of the thesis, we focus on the problem of myopia. This problem arises when greedy learning strategies are appl...

Learning of Protein Interaction Network

Protein-protein interactions (PPI) play a key role in determining the outcome of most cellular processes. Correctly identifying and characterizing protein interactions and the networks they comprise is critical for understanding the molecular mechanisms within the cell. Large-scale biological experimental methods can directly and systematically detect the set of interacting proteins within an o...

Publication date: 2006